Abstract

Recent developments in jet clustering are reviewed. We present a list
of fast, infrared and collinear safe algorithms, and also describe
new tools such as jet areas. We show how these techniques can be applied
to the study of the underlying event or, more generally, of any background
which can be considered to be distributed in a sufficiently uniform way.



1 Recent Developments in Jet Clustering 

The final state of a high energy hadronic collision is inherently extremely complicated. Hundreds 
or even thousands of particles will be recorded by detectors at the Large Hadron Collider (LHC), 
making the task of reconstructing the original (simpler) hard event very difficult. This large 
number of particles is the product of a number of branchings and decays which follow the initial 
production of a handful of partons. Usually only a limited number of stages of this production 
process can be meaningfully described in quantitative terms, for instance by perturbation theory 
in QCD. This is why, in order to compare theory and data, the latter must first be simplified down 
to the level described by the theory. 

Jet clustering algorithms offer precisely this possibility of creating calculable observables 
from many final-state particles. This is done by clustering them into jets via a well specified 
algorithm, which usually contains one or more parameters, the most important of them being 
a "radius" R which controls the extension of the jet in the rapidity-azimuth plane. One can 
also choose a recombination scheme, which controls how partons' (or jets') four-momenta are 
combined. The choice of a jet algorithm, its parameters and the recombination scheme is called a 
jet definition [1], and must be specified in full (together with the initial particle sample) in order
for the process 

$$\{\text{particles}\} \xrightarrow{\ \text{jet definition}\ } \{\text{jets}\} \qquad (1)$$

to be fully reproducible and the final jets to be the same. 

While (almost) any jet definition can produce sensible observables, not all of them will
produce one which is calculable in perturbation theory. For this to be true, the jet algorithm
must be infrared and collinear safe (IRC safe) [2], meaning that actions producing configurations
that lead to divergences in perturbation theory (namely the emission of a very soft particle or a
collinear splitting of a particle into two) must not produce any change in the jets returned by the
algorithm.

Jet algorithm            Type of algorithm (distance measure)                             Algorithmic complexity

$k_t$ [5,6]              SR, $d_{ij} = \min(k_{ti}^2, k_{tj}^2)\,\Delta R_{ij}^2/R^2$        $N \ln N$
Cambridge/Aachen [7,8]   SR, $d_{ij} = \Delta R_{ij}^2/R^2$                                 $N \ln N$
anti-$k_t$ [10]          SR, $d_{ij} = \min(k_{ti}^{-2}, k_{tj}^{-2})\,\Delta R_{ij}^2/R^2$  $N^{3/2}$
SISCone [9]              seedless iterative cone with split-merge                         $N^2 \ln N$

Table 1: List of some of the IRC safe algorithms available in FastJet. SR stands for 'sequential recombination'.
$k_{ti}$ is a transverse momentum, and the angular distance is given by $\Delta R_{ij}^2 = \Delta y_{ij}^2 + \Delta\phi_{ij}^2$.
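
The three sequential recombination distances in Table 1 differ only in the power of the transverse momentum, so they can be illustrated with a single routine. The sketch below is a naive, roughly O(N^3) inclusive clustering written for readability only; it is not the fast FastJet implementation. The function names, the (px, py, pz, E) tuple layout, the beam distance $d_{iB} = k_{ti}^{2p}$ and the choice of summing four-momenta as the recombination scheme are assumptions made for this illustration.

```python
import math

def from_pt_y_phi(pt, y, phi):
    """Massless four-momentum (px, py, pz, E); hypothetical helper for the examples."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y), pt * math.cosh(y))

def _pt2(p):
    return p[0] * p[0] + p[1] * p[1]

def _rapidity(p):
    return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))

def _delta_r2(a, b):
    dy = _rapidity(a) - _rapidity(b)
    dphi = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return dy * dy + dphi * dphi

def cluster(particles, R=0.5, power=-1):
    """Naive inclusive sequential recombination.

    power = 1 gives kt, 0 gives Cambridge/Aachen, -1 gives anti-kt.
    Assumes every input has non-zero transverse momentum.
    """
    pseudojets = [tuple(p) for p in particles]
    jets = []
    while pseudojets:
        best = None  # (distance, i, j); j is None for a beam distance
        for i, a in enumerate(pseudojets):
            d_ib = _pt2(a) ** power  # beam distance (assumed form)
            if best is None or d_ib < best[0]:
                best = (d_ib, i, None)
            for j in range(i + 1, len(pseudojets)):
                b = pseudojets[j]
                d_ij = min(_pt2(a) ** power, _pt2(b) ** power) * _delta_r2(a, b) / (R * R)
                if d_ij < best[0]:
                    best = (d_ij, i, j)
        _, i, j = best
        if j is None:
            jets.append(pseudojets.pop(i))  # promote to a final jet
        else:
            b = pseudojets.pop(j)  # j > i, so remove j first
            a = pseudojets.pop(i)
            pseudojets.append(tuple(x + y for x, y in zip(a, b)))  # sum four-momenta
    return sorted(jets, key=_pt2, reverse=True)
```

A simple way to probe IRC safety with this sketch is to cluster an event, then cluster it again after splitting one particle collinearly or after adding a very soft particle, and check that the hard jets are unchanged.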



The importance of IRC safety for jet algorithms had been recognized as early as 1990 in
the 'Snowmass accord' [3], together with the need for them to be easily applicable both on the
theoretical and the experimental side. However, many of the implementations of jet clustering
algorithms used in the following decade and a half failed to provide these characteristics:
cone-type algorithms were typically infrared or collinear unsafe beyond the two or three particle level
(see [1] for a review), whereas recombination-type algorithms were usually considered too slow
to be usable at the experimental level in hadronic collisions.

This deadlock was finally broken by two papers: one in 2005 [4], which made sequential
recombination type clustering algorithms like $k_t$ [5, 6] and Cambridge/Aachen [7, 8]
fast, and one in 2007, which introduced SISCone [9], a cone-type algorithm which is infrared
and collinear safe. A third paper introduced, in 2008, the anti-$k_t$ algorithm [10], a fast, IRC
safe recombination-type algorithm which nevertheless behaves, for many practical purposes, like a
nearly perfect cone. This set of algorithms (see Table 1), all available through the FastJet
package [11], allows one to replace most of the unsafe algorithms still in use with fast and IRC
safe ones, while retaining their main characteristics (for instance, the MidPoint and the ATLAS
cone could be replaced by SISCone, and the CMS cone could be replaced by anti-$k_t$).

2 Jet Areas 

A by-product of the speed and the infrared safety of the new algorithms (or new implementations 
of older algorithms) was found to be the possibility to define in a practical way the area of a jet, 
which measures its susceptibility to be contaminated by a uniformly distributed background of 
soft particles in a given event. 

In their most modest incarnation, jet areas can be used to visualize the outline of the jets 
returned by an algorithm so as to appreciate, for instance, if it returns regular ("conical") jets or 
rather ragged ones. An example is given in Fig. 1.

Jet areas are amenable, to some extent, to analytic treatment [12], or can be measured
numerically with the tools provided by FastJet. These analyses disprove the common
assumption that all cone-type algorithms have areas equal to $\pi R^2$. In fact, depending on exactly
which type of cone algorithm one considers, its areas can differ, even substantially so, from this
naive estimate: for instance, the area of a SISCone jet made of a single hard particle immersed in
a background of many soft particles is $\pi R^2/4$ (this small catchment area can explain why other
iterative cone algorithms with a split-merge step, like the MidPoint algorithm in use at CDF,
have often been seen to fare 'well' in noisy environments). One can next analyse the $k_t$ and the
Cambridge/Aachen algorithms, and see that their single-hard-particle areas turn out to be roughly
$0.81\,\pi R^2$. Finally, this area for the anti-$k_t$ algorithm is instead exactly $\pi R^2$. This fact, together
with its regular contours shown in Fig. 1, explains why it is usually considered to behave like a
'perfect cone'.

Fig. 1: Typical jet outlines returned by four different IRC safe jet clustering algorithms. From [10].
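
To get a feeling for the size of these differences, the single-hard-particle areas quoted above can be evaluated numerically. The following is only a small arithmetic sketch: the dictionary and function names are hypothetical, the coefficients are the ones given in the text (with 0.81 being approximate), and the contamination estimate simply multiplies a background density by the area, on the assumption that the background is uniform.

```python
import math

# Single-hard-particle areas in units of pi * R^2, as quoted in the text.
AREA_IN_UNITS_OF_PI_R2 = {
    "SISCone": 0.25,
    "kt": 0.81,                # approximate value
    "Cambridge/Aachen": 0.81,  # approximate value
    "anti-kt": 1.0,
}

def single_hard_particle_area(algorithm, R):
    """Area of a jet made of one hard particle in a soft background."""
    return AREA_IN_UNITS_OF_PI_R2[algorithm] * math.pi * R * R

def expected_contamination(algorithm, R, background_density):
    """Transverse momentum picked up from a uniform background (density times area)."""
    return background_density * single_hard_particle_area(algorithm, R)

for name in AREA_IN_UNITS_OF_PI_R2:
    print(f"{name:17s} area(R=0.5) = {single_hard_particle_area(name, 0.5):.3f}")
```

For equal R, a SISCone jet of this kind would collect a quarter of the uniform background collected by an anti-kt jet.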

Jet areas also allow one to use some jet algorithms as tools to measure the level of a
sufficiently uniform background which accompanies harder events. This can be accomplished
by following the procedure outlined in [13]: for each event, all particles are clustered into jets
using either the $k_t$ or the Cambridge/Aachen algorithm, and the transverse momentum $p_{tj}$ and
the area $A_j$ of each jet are calculated. One observes that a few hard jets have large values of
transverse momentum divided by area, whereas most of the other, softer jets have smaller (and
similar) values of this ratio. The background level $\rho$, the transverse momentum per unit area in the
rapidity-azimuth plane, is then obtained as

$$\rho = \underset{j \in \mathcal{R}}{\mathrm{median}} \left[ \left\{ \frac{p_{tj}}{A_j} \right\} \right] . \qquad (2)$$

The range $\mathcal{R}$ should be the largest possible region of the rapidity-azimuth plane over which the
background is expected to be constant.
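
Given a list of jets with their transverse momenta and areas, the median estimate is a one-line computation. The sketch below assumes the jets have already been found and their areas measured by some other tool; the `(pt, area, rapidity)` tuple layout, the function name `estimate_rho` and the `in_range` predicate are hypothetical choices made for this illustration.

```python
from statistics import median

def estimate_rho(jets, in_range=lambda jet: True):
    """Median of pt/area over the jets selected by in_range.

    jets: iterable of (pt, area, rapidity) tuples (assumed layout).
    Jets with zero area are skipped, since pt/area is undefined for them.
    """
    ratios = [jet[0] / jet[1] for jet in jets if jet[1] > 0.0 and in_range(jet)]
    if not ratios:
        raise ValueError("no jets with positive area in the chosen range")
    return median(ratios)

# Two hard jets and five soft ones: the median is insensitive to the hard jets.
example_jets = [
    (120.0, 0.80, 0.3), (95.0, 0.75, -1.1),                   # hard jets
    (1.0, 0.5, 2.0), (1.5, 0.6, -2.5), (0.9, 0.6, 1.2),       # soft jets
    (1.2, 0.5, -0.4), (0.8, 0.4, 3.1),
]
rho = estimate_rho(example_jets, in_range=lambda jet: abs(jet[2]) < 4.0)
```

In this invented example the mean of the ratios would be dominated by the two hard jets, whereas the median stays at the level of the soft ones.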

The operation of taking the median of the $\{p_{tj}/A_j\}$ distribution is, to some extent,
arbitrary. It has been found to give sensible results, provided that the range $\mathcal{R}$ contains sufficiently
many soft background jets: at least about ten (twenty) of them, if only one (two) harder jets are
also present in $\mathcal{R}$, are usually enough [14].

Fig. 2: Determination of the background level $\rho$ of a toy-model random underlying event, as a function of the radius
parameter $R$. Each point is the result of averaging over many different realizations. The parameters have been adjusted
to roughly reproduce the situation expected at the LHC.

3 Underlying Event Studies 

To a certain extent, and within certain limits, the background to a hard collision created by the
soft particles of the underlying event (UE) can be considered fairly uniform. It then becomes
amenable to study with the technique introduced in the previous Section. This constitutes
an alternative to the usual and widespread approach (see for instance [15, 16]) of triggering on a
leading jet, and selecting the two regions in azimuth which are transverse to its direction
and to that of the recoil jet. These two regions are considered to be little affected by hard radiation
(in the less energetic of them it is expected to be suppressed by at least two powers of $\alpha_s$), and
therefore one can expect to be able to measure the UE level there.

This way of selecting the UE can be considered a topological one: particles (or jets) are
classified as belonging to the UE or not as a result of their position. On the other hand, the
median procedure described in the previous Section can be thought of as a dynamical selection:
no a priori hypotheses are made and, in a way that changes from one event to another, a jet is
automatically classified as belonging to the hard event or to the background as a result of its
characteristics (namely the value of the $p_{tj}/A_j$ ratio). One can further show that this selection pushes
the possible contamination from perturbative radiation to very large powers of $\alpha_s$: for a range $\mathcal{R}$
defined by $|y| < y_{\max}$, perturbative contamination will only start at order $n \sim 3 y_{\max}/R^2$ [13].
This gives $n \sim 24$ for $y_{\max} = 2$ and $R = 0.5$, suggesting that the perturbative contribution is
minimal.

Fig. 3: Determination of the background level $\rho$ in realistic dijet events at the LHC, with (right) and without (left)
pileup. Preliminary results.

A sensible criticism of this procedure is that the UE distribution is not necessarily uniform,
and may for instance vary as a function of rapidity. A way around this is then to choose smaller
ranges, located at different rapidity values, and repeat the $\rho$ determination in each of them. Of
course care will have to be taken that the chosen ranges remain large enough to satisfy the
criterion on the number of soft jets versus hard ones given in the previous Section: for instance, a
range one unit of rapidity wide can be expected to contain roughly $2\pi/(0.55\,\pi R^2) \simeq 15$ soft jets
for $R = 0.5$, which makes it marginally suited to the task.$^1$
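
The two back-of-the-envelope estimates used in this Section, the order at which perturbative contamination sets in and the number of soft jets expected in a rapidity range, are simple enough to check numerically. The sketch below only reproduces the arithmetic quoted in the text; the function names are hypothetical, and the default coefficient 0.55 is the typical soft-jet area in units of $\pi R^2$ taken from the estimate above.

```python
import math

def perturbative_order(y_max, R):
    """Order n ~ 3 * y_max / R^2 at which perturbative contamination starts."""
    return 3.0 * y_max / (R * R)

def expected_soft_jets(delta_y, R, area_coeff=0.55):
    """Soft jets expected in a range of width delta_y covering all azimuths.

    The range area is 2*pi*delta_y; each soft jet is assumed to occupy
    area_coeff * pi * R^2 on average.
    """
    return 2.0 * math.pi * delta_y / (area_coeff * math.pi * R * R)

print(perturbative_order(2.0, 0.5))   # order for |y| < 2, R = 0.5
print(expected_soft_jets(1.0, 0.5))   # one unit of rapidity, R = 0.5
```

Such a helper makes it easy to see how quickly the soft-jet count falls below the ten to twenty jets recommended earlier when R is increased or the range is narrowed.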

A final word should be spent on which values of the radius parameter $R$ can be considered
appropriate for this analysis. Roughly speaking, $R$ should be large enough for the number of
'real' jets (i.e. containing real particles) to be at least larger than the number of 'empty jets'
(regions of the rapidity-azimuth plane void of particles, and not occupied by any 'real' jet). It
should also be small enough to avoid having too many jets containing too many hard particles.
Analytical estimates [13] and empirical evidence show that for UE estimation in typical LHC
conditions one can expect values of the order of 0.5 - 0.6 to be appropriate. Much smaller values
will return $\rho \simeq 0$, while larger values will tend to return progressively larger values of $\rho$, as a
result of the increasing contamination from the hard jets. Fig. 2 shows results obtained with a
toy model where 100 soft particles with $p_t^{\rm soft} \sim 1$ GeV are generated in a $|y| < 4$ region. Ten
hard particles, with $p_t^{\rm hard} \sim 100$ GeV, can be additionally generated in the same region. One
observes how, above a threshold value of $R$, $\rho$ is estimated correctly for the soft-only case, while
when hard particles are present they increasingly contaminate the estimate of the background.
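
The qualitative behaviour of the toy model can be explored without a jet finder by replacing jets with rectangular cells of a grid in the rapidity-azimuth plane and taking the median of the transverse momentum per unit cell area. This is a deliberately simplified stand-in and not the jet-area procedure of [13]: the grid, the function names, the fixed transverse momenta of exactly 1 and 100 GeV and the random seed are all assumptions of this sketch.

```python
import math
import random
from statistics import median

def toy_event(n_soft=100, n_hard=0, y_max=4.0, pt_soft=1.0, pt_hard=100.0, rng=None):
    """Particles (pt, y, phi) spread uniformly over |y| < y_max and 0 <= phi < 2*pi."""
    rng = rng or random.Random(1)  # fixed seed, for reproducibility only

    def point():
        return rng.uniform(-y_max, y_max), rng.uniform(0.0, 2.0 * math.pi)

    event = [(pt_soft, *point()) for _ in range(n_soft)]
    event += [(pt_hard, *point()) for _ in range(n_hard)]
    return event

def grid_median_rho(event, n_y, n_phi, y_max=4.0):
    """Median over grid cells of (summed pt in cell) / (cell area)."""
    cell_area = (2.0 * y_max / n_y) * (2.0 * math.pi / n_phi)
    sums = [0.0] * (n_y * n_phi)
    for pt, y, phi in event:
        iy = min(int((y + y_max) / (2.0 * y_max) * n_y), n_y - 1)
        iphi = min(int(phi / (2.0 * math.pi) * n_phi), n_phi - 1)
        sums[iy * n_phi + iphi] += pt
    return median(s / cell_area for s in sums)

soft_only = toy_event()
with_hard = toy_event(n_hard=10)
true_rho = 100 * 1.0 / (2.0 * 4.0 * 2.0 * math.pi)  # total soft pt / total area
rho_tiny_cells = grid_median_rho(soft_only, 80, 64)  # most cells are empty
rho_large_cells = grid_median_rho(soft_only, 8, 6)   # cells of about unit area
rho_with_hard = grid_median_rho(with_hard, 8, 6)
```

With cells much smaller than the typical spacing between particles, more than half of the cells are empty and the median is exactly zero, which mirrors the $\rho \simeq 0$ result at small $R$. With larger cells the median becomes non-zero, and because only a minority of cells can contain one of the ten hard particles, the median is far less affected by them than a mean would be.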

The same analysis can be performed on more realistic events, generated by Monte Carlo
simulations. Fig. 3 shows the determination of $\rho$ in a simulated dijet event at the LHC, with
and without pileup. In both cases the general structure of the toy model in Fig. 2 can be seen,
though it is worth noting that in the UE case (left plot) the slope can vary significantly from event
to event, and also according to the Monte Carlo tune used [14]. The larger particle density (and
probably higher uniformity) of the pileup case allows for an easier and more stable determination.

$^1$ Its performance can be improved by removing the hardest jets it contains from the $\{p_{tj}/A_j\}$ list before taking
the median [14].

Fig. 4: Distributions of $\rho$ from the UE over many simulated LHC dijet events ($p_t > 50$ GeV, $|y| < 4$), using different
Monte Carlo generators and different UE tunes. Preliminary results.

Once a procedure for determining $\rho$ is available, one can think of many different
applications. One possibility is of course to tune Monte Carlo models to real data by comparing
$\rho$ distributions, correlations, etc. A preliminary example is given in Fig. 4, where studying
the distribution of $\rho$ can be seen to allow one to discriminate between UE models which would
otherwise give similar values for the average contribution $\langle\rho\rangle$. More extensive studies are in
progress [14].

Yet another use of measured $\rho$ values is the subtraction of the background from the
transverse momentum of hard jets. Ref. [13] proposed to correct the four-momentum $p_{\mu j}$ of the jet
$j$ by an amount proportional to $\rho$ and to the area of the jet itself (the susceptibility of the jet to
contamination):

$$p_{\mu j}^{\rm sub} = p_{\mu j} - \rho\, A_{\mu j} , \qquad (3)$$

where $A_{\mu j}$ is a four-dimensional generalization of the concept of jet area, normalized in such a
way that its transverse component coincides, for small jets, with the scalar area $A_j$ [12]. One
can show [13, 17] that such subtraction of the underlying event can improve in a non-negligible
way the reconstruction of mass peaks even at very large energy scales. A similar procedure is
also being considered [18] for heavy ion collisions, where the background can contribute a huge
contamination, even larger than the transverse momentum of the hard jet itself (partly because
of this, one usually speaks of 'jet reconstruction' in this context, rather than just 'subtraction').
Initial versions of this technique have already been employed at the experimental level by the
STAR Collaboration at RHIC in [19,20], where IRC safe jets have been reconstructed for the
first time in heavy ion collisions.
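
The correction described above, a shift of the jet four-momentum by the background level times a four-vector area, amounts to a component-wise subtraction. The sketch below assumes that the four-vector area has already been obtained from some jet-area tool; the function name, the (px, py, pz, E) tuple layout and the numbers in the example are hypothetical and chosen only to show the arithmetic.

```python
def subtract_background(jet_p4, area_p4, rho):
    """Return the jet four-momentum minus rho times the four-vector area.

    jet_p4, area_p4: (px, py, pz, E) tuples (assumed layout).
    rho: background transverse momentum per unit area.
    """
    return tuple(p - rho * a for p, a in zip(jet_p4, area_p4))

# Invented numbers: a jet along the x axis with a four-vector area of about 0.8.
jet = (50.0, 0.0, 10.0, 51.0)
area = (0.8, 0.0, 0.1, 0.81)
corrected = subtract_background(jet, area, rho=2.0)
```

Because the correction is applied per jet and uses a value of the background level measured event by event, jets with larger areas receive proportionally larger corrections.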

4 Conclusions 

Since 2005 numerous developments have taken place in jet physics. A number of fast, infrared
and collinear safe algorithms are now available, allowing for great flexibility in analyses. Tools
have been developed and practically implemented to calculate jet areas, and these can be used to
study various types of backgrounds (underlying event, pileup, heavy-ion background) and also
to subtract their contribution to large transverse-momentum jets.

These new algorithms and methods (as well as the ones not mentioned in this talk, like
the many approaches to jet substructure, see e.g. [21-25], useful in a number of new-physics
searches) are transforming jet physics from being just a procedure to obtain calculable
observables to providing a full array of precision tools with which to probe efficiently the complex final
states of high energy collisions.